Abstract

Recently, several researchers have found that cost-based sat- 
isficing search with A* often runs into problems. Although 
some "work arounds" have been proposed to ameliorate the 
problem, there has not been any concerted effort to pinpoint 
its origin. In this paper, we argue that the origins can be 
traced back to the wide variance in action costs that is ob- 
served in most planning domains. We show that such cost 
variance misleads A* search, and that this is no trifling de- 
tail or accidental phenomenon, but a systemic weakness of 
the very concept of "cost-based evaluation functions +
systematic search + combinatorial graphs". We show that
satisficing search with size-based evaluation functions is
largely immune to this problem.

1 Introduction 

Much of the scale-up, as well as the research focus, in the 
automated planning community in recent years has been
on satisficing planning. Unfortunately, there hasn't been 
a concomitant increase in our understanding of satisficing 
search. Too often, the "theory" of satisficing search defaults 
to doing A* with inadmissible heuristics. While removing 
the requirement of admissible heuristics certainly relaxes the 
guarantee of optimality, there is no implied guarantee of ef- 
ficiency. A combinatorial search can be seen to consist of 
two parts: a "discovery" part where the (optimal) solution 
is found and a "proof" part where the optimality of the so- 
lution is verified. While an optimizing search depends cru- 
cially on both these phases, a satisficing search is instead 
affected more directly by the discovery phase. Now, stan- 
dard A* search conflates the discovery and proof phases to- 
gether and terminates only when it picks the optimal path 
for expansion. By default, satisficing planners use the same 
search regime, but relax the admissibility requirement on the 
heuristics. This may not cause too much of a problem in
domains with uniform action costs, but when actions can have
non-uniform costs, the optimal and second-best solutions
can be arbitrarily far apart in depth. Consequently, A*
search with cost-based evaluation functions can be an
arbitrarily bad strategy for satisficing search, as it waits
until the solution is both discovered and proved to be optimal.

*An extended abstract of this paper appeared in the proceedings
of SOCS 2010. This research is supported in part by ONR
grants N00014-09-1-0017 and N00014-07-1-1049, and the NSF
grant IIS-0905672.



To be more specific, consider a planning problem for
which the cost-optimal and second-best solutions consist of
10 and 1000 unspecified actions, in some order. The optimal
solution may be the larger one. How long should it take
just to find the 10 action plan? How long should it take to
prove (or disprove) its optimality? In general (presuming
PSPACE/EXPSPACE ≠ P):

1. Discovery should require time exponential in, at most, 10. 

2. Proof should require time exponential in, at least, 1000. 

That is, in principle, the only way to (domain-independently) 
prove that the 10 action plan is better or worse than the 1000 
action one is to in fact go and discover the 1000 action plan. 
Thus, A* search with a cost-based evaluation function will
take time proportional to $b^{1000}$ (with $b$ the branching
factor) for either discovery or proof.

Using both abstract and benchmark problems, we will
demonstrate that this is a systematic weakness of any search
that uses a cost-based evaluation function. In particular, we
shall see that if ε is the cost of the smallest cost action
(after all costs are normalized so the maximal cost action
costs 1 unit), then the time taken to discover a depth d
optimal solution will be $b^{d/\varepsilon}$. If all actions
have the same cost, then ε ≈ 1, whereas if the actions have
significant cost variance, then ε ≪ 1. We shall see that for a
variety of reasons, most real-world planning domains do
exhibit high cost variance, thus presenting an "ε-cost trap"
that forces any cost-based satisficing search to dig its own
($1/\varepsilon$ deep) grave.

Consequently, we argue that satisficing search should resist
the temptation to directly use cost-based evaluation functions
(i.e., f functions that return answers in cost units), even
when the quality (cost measure) of the resulting plan is of
interest. We will consider two size-based branch-and-bound
alternatives: the straightforward one, which completely
ignores costs and sticks to a purely size-based evaluation
function, and a more subtle one that uses a cost-sensitive
size-based evaluation function (specifically, the heuristic
returns the size of the cheapest cost path; see Section 2). We
show that both of these outperform cost-based evaluation
functions in the presence of ε-cost traps, with the second one
providing better quality plans (for the same run time limits)
than the first in our empirical studies.

While some of the problems with cost-based satisficing
search have also been observed, in passing, by other
researchers (e.g., (Benton et al. 2010; Richter and Westphal
2010)), and some work-arounds have been suggested, our
main contribution is to bring to the fore its fundamental
nature. The rest of the paper is organized as follows. In the
next section, we present some preliminary notation to
formally specify cost-based, size-based as well as
cost-sensitive size-based search alternatives. Next, we present
two abstract and fundamental search spaces, which demonstrate
that cost-based evaluation functions are 'always' needlessly
prone to such traps (Section 3). Section 4 strengthens the
intuitions behind this analysis by viewing A* search as
flooding topological surfaces set up by evaluation functions.
We will argue that of all possible topological surfaces (i.e.,
evaluation functions) to choose for search, cost-based is the
worst. In Section 5 we put all this analysis to empirical
validation by experimenting with LAMA (Richter and Westphal 2010)
and SapaReplan. The experiments do show that size-based 
alternatives out-perform cost-based search. Modern plan- 
ners such as LAMA use a plethora of improvements beyond 
vanilla A* search, and in the appendix we provide a deeper 
analysis on which extensions of LAMA seem to help it mask 
(but not fully overcome) the pernicious effects of cost-based 
evaluation functions. 

2 Setup and Notation 

We gear the problem set up to be in line with the prevalent 
view of state-space search in modern, state-of-the-art satis- 
ficing planners. First, we assume the current popular ap- 
proach of reducing planning to graph search. That is, plan- 
ners typically model the state-space in a causal direction, 
so the problem becomes one of extracting paths, meaning 
plans do not need to be stored in each state. More important
is that the structure of the graph is given implicitly by a
procedure Γ, the child generator, with Γ(v) returning the
local subgraph leaving v; i.e., Γ(v) computes the subgraph
$(N^+[v], E(\{v\}, V - v)) = (\{u \mid vu \in E\} + v, \{vu \mid vu \in E\})$
along with all associated labels, weights, and so forth. That
is, our analysis depends on the assumption that an implicit
representation of the graph is the only computationally
feasible representation, a common requirement for analyzing
the A* family of algorithms (Hart, Nilsson, and Raphael 1968;
Dechter and Pearl 1985).

The search problem is to find a path from an initial state,
i, to some goal state in G. Let costs be represented as
edge weights, say $c(uv)$ is the cost of an edge from u to
v. Let $g_c^*(v)$ be the (optimal) cost-to-reach v (from i), and
$h_c^*(v)$ be the (optimal) cost-to-go from v (to the goal). Then
$f_c^*(v) := g_c^*(v) + h_c^*(v)$, the cost-through v, is the cost
of the cheapest i-G path passing through v. For discussing
smallest solutions, let $f_s^*(v)$ denote the size of the smallest
i-G path through v. It is also interesting to consider the size
of the cheapest i-G path passing through v, say $\hat{f}_s^*(v)$.

We define a search node n as equivalent to a path represented
as a linked list. In particular, we distinguish this from the
state of n (its last vertex), n.v. We say n.a (for action) is
the last edge of the path and n.p (for parent) is the subpath
excluding n.a and n.v. With n.a an edge from v to u, the
function $g_c(n)$ (g-cost) is just the recursive formulation of
path cost: $g_c(n) := g_c(n.p) + c(vu)$ ($g_c(n) := 0$ if n is
the trivial path). So $g_c^*(v) \le g_c(n)$ for all i-v paths n,
with equality for at least one of them. Similarly let
$g_s(n) := g_s(n.p) + 1$ (initialized at 0), so that $g_s$ is an
upper bound on $g_s^*$.



A goal is a target vertex where a plan may stop and be a
valid solution. We fix a computed predicate G(v) (a black-box)
encoding the set of goal vertices. Let $h_c(v)$, the heuristic,
be a procedure to estimate $h_c^*(v)$. We call $h_c$ admissible
if it is a guaranteed lower bound. Let $h_s(v)$ estimate the
remaining depth to the nearest goal, and let $\hat{h}_s(v)$
estimate the remaining depth to the cheapest reachable goal.

We focus on two different definitions of f (the evaluation
function). Since we study cost-based planning, we consider
$f_c(n) := g_c(n) + h_c(n.v)$; this is the (standard, cost-valued)
evaluation function of A*: cheapest-completion-first. We
compare this to $f_s(n) := g_s(n) + h_s(n.v)$, the canonical
size-valued (or search distance) evaluation function,
equivalent to $f_c$ under uniform weights. Any combination of
$g_c$ and $h_c$ is cost-based; any combination of $g_s$ and $h_s$
is size-based (e.g., breadth-first search is size-based). The
evaluation function $\hat{f}_s(n) := g_s(n) + \hat{h}_s(n.v)$ is
also size-valued, but cost-sensitive and preferable.
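The distinction between cost-valued and size-valued evaluation can be made concrete with a minimal sketch. The `Node` layout, the `make_f` helper, and the zero heuristic below are illustrative assumptions, not part of any planner discussed here; only the recursive definitions of g-cost and g-size follow the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Node:
    """A search node: a path stored as a linked list (illustrative layout)."""
    v: str                        # n.v: the state, i.e. the last vertex
    p: Optional["Node"] = None    # n.p: the parent subpath
    cost: float = 0.0             # cost of the last edge n.a (unused at the root)


def g_c(n: Node) -> float:
    """Path cost: g_c(n) = g_c(n.p) + c(n.a), and 0 for the trivial path."""
    return 0.0 if n.p is None else g_c(n.p) + n.cost


def g_s(n: Node) -> int:
    """Path size: g_s(n) = g_s(n.p) + 1, and 0 for the trivial path."""
    return 0 if n.p is None else g_s(n.p) + 1


def make_f(g: Callable[[Node], float], h: Callable[[str], float]):
    """Build an evaluation function f(n) = g(n) + h(n.v)."""
    return lambda n: g(n) + h(n.v)


zero = lambda v: 0.0          # assumption: the trivial heuristic, for illustration
f_c = make_f(g_c, zero)       # cost-based evaluation
f_s = make_f(g_s, zero)       # size-based evaluation

# A two-edge path i -> a -> b with one expensive and one cheap edge.
path = Node("b", Node("a", Node("i"), 1.0), 0.01)
print(f_c(path), f_s(path))
```

The same path is ranked on two different scales: the cheap edge barely moves the cost-based value but counts fully toward the size-based one.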

BEST-FIRST-SEARCH(i, G, Γ, h_c, EVALUATE)
 1  INITIALIZE-SEARCH()
 2  while open not empty
 3      n = open.remove()
 4      s = n.v
 5      if BOUND-TEST() then continue
 6      if GOAL-TEST() then continue
 7      if DUPLICATE-TEST() then continue
 8      star = Γ(s)                  // Expand s
 9      for each edge a from s to a child s' in star
10          n' = n(sas')             // Extend the path n
11          f = EVALUATE(n')
12          open.add(n', f)
13  return best-known-plan           // Optimality is proven.

EVALUATE(n)
 1  s = n.v
 2  n' = RELAXED-SOLVE(s, G, ...)
    // What is the best measure on paths, γ(), to use?
 3  f = γ(nn')
    // With f = g + h the first variations to consider are:
    //   g = g_c(n), h = g_c(n'), and
    //   g = g_s(n), h = g_s(n').
 4  return f

INITIALIZE-SEARCH()
 1  open = empty priority queue
 2  closed = empty map from vertices to paths
 3  f+ = ∞                           // An upper bound on f_c*(i)
 4  best-known-plan = NULL
 5  n = (i)
 6  f = EVALUATE(n)
 7  open.add(n, f)

BOUND-TEST()
    // h_c() must be a lower bound on h_c*()
 1  return g_c(n) + h_c(s) ≥ f+

GOAL-TEST()
 1  if G(s) then
 2      f+ = g_c(n)
 3      best-known-plan = n
 4      report best-known-plan
 5      return true
 6  return false

DUPLICATE-TEST()
 1  n' = closed.get(s)
 2  if n' not null then
 3      if g_c(n') ≤ g_c(n) then
 4          return true
        // Need to re-expand s, eventually.
        // Doing nothing here is one strategy.
 5  closed.put(s, n)
 6  return false

Pseudo-code for best-first branch-and-bound search of
implicit graphs is shown above. It continues searching after
a solution is encountered and uses the current best solution
value to prune the search space (line 5). The search is
performed on a graph implicitly represented by Γ, with the
assumption being that the explicit graph is so large that it
is better to invoke expensive heuristics (EVALUATE) during
the search than it is to just compute the graph up front. The
pseudo-code given for EVALUATE shows one particular approach
(solving relaxed problems) to automatically devising
guidance; in that setting, the question considered by this
paper is whether to measure the sizes or the costs of the two
paths (n and n').
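For readers who want to experiment, the following is a minimal runnable sketch of such a best-first branch-and-bound loop. It is an illustration under stated assumptions rather than the planners' implementation: the function name `best_first_bnb`, the `(child, cost)` successor interface, the `evaluate(g_c, g_s, state)` signature, and the toy graph are all hypothetical, and the relaxed-problem heuristic is replaced by a caller-supplied lower bound.

```python
import heapq
import itertools
import math


def best_first_bnb(start, is_goal, successors, h_c, evaluate):
    """Best-first branch-and-bound over an implicit graph.

    successors(s) yields (child, cost) pairs; h_c must be a lower bound on
    the cost-to-go for pruning to be sound; evaluate(g_c, g_s, s) gives the
    queue priority (lower first). Returns every reported plan, in the order
    found, as (cost, path) pairs; the last entry is the best plan.
    """
    tie = itertools.count()              # tie-breaker: never compare paths
    best_cost, reported = math.inf, []
    closed = {}                          # state -> cheapest g_c at expansion
    open_list = [(evaluate(0.0, 0, start), next(tie), 0.0, 0, (start,))]
    while open_list:
        _, _, gc, gs, path = heapq.heappop(open_list)
        s = path[-1]
        if gc + h_c(s) >= best_cost:     # bound test: cannot improve
            continue
        if is_goal(s):                   # goal test: report improved plan
            best_cost = gc
            reported.append((gc, path))
            continue
        if s in closed and closed[s] <= gc:   # duplicate test
            continue
        closed[s] = gc                   # (re-)expand s
        for child, cost in successors(s):
            heapq.heappush(open_list, (evaluate(gc + cost, gs + 1, child),
                                       next(tie), gc + cost, gs + 1,
                                       path + (child,)))
    return reported


# Toy graph (an assumption for illustration): a short expensive plan
# A -> B and a long cheap plan A -> C -> D -> B.
graph = {"A": [("B", 10), ("C", 1)], "C": [("D", 1)], "D": [("B", 1)], "B": []}
args = ("A", lambda s: s == "B", lambda s: graph[s], lambda s: 0.0)
by_size = best_first_bnb(*args, evaluate=lambda gc, gs, s: gs)
by_cost = best_first_bnb(*args, evaluate=lambda gc, gs, s: gc)
print([c for c, _ in by_size], [c for c, _ in by_cost])
```

On this toy graph both orderings end with the same best plan, but the size-based ordering reports the short plan first and then improves on it, while the cost-based ordering reports nothing until it reaches the cheapest plan.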

With respect to normalizing costs, we can let
$\varepsilon := \frac{\min_a c(a)}{\max_a c(a)}$; that is, ε is
the least cost edge after normalizing costs by the maximum
cost (to bring costs into the range [0, 1]). We use the symbol
ε for this ratio as we anticipate actions with high cost
variance in real world planning problems. For example:
boarding versus flying (ZenoTravel), mode-switching versus
machine operation (Job-Shop), and (unskilled) labor versus
(precious) material cost.
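As a small worked illustration of this ratio, the sketch below computes it for a list of action costs. The helper name `epsilon` and the input validation are assumptions; the sample costs are the ones given in the caption of Figure 1 (board/debark 1, diagonal edges 7,000, exterior edges 10,000).

```python
def epsilon(costs):
    """Least action cost after normalizing by the maximum action cost."""
    costs = list(costs)
    if not costs or min(costs) < 0 or max(costs) <= 0:
        raise ValueError("need non-negative costs with a positive maximum")
    return min(costs) / max(costs)


# Costs taken from the rendezvous problems of Figure 1.
eps = epsilon([1, 7000, 10000])
print(eps)
```

A uniform-cost domain gives a ratio of 1, while the rendezvous costs give a ratio four orders of magnitude smaller.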

3 ε-cost Trap: Two Canonical Cases

In this section we argue that the mere presence of ε-cost
misleads cost-based search, and that this is no trifling detail
or accidental phenomenon, but a systemic weakness of the
very concept of "cost-based evaluation functions + systematic
search + combinatorial graphs". We base this analysis
in two abstract search spaces, in order to demonstrate the
fundamental nature of such traps. The first abstract space
we consider is the simplest non-trivial (non-uniform cost)
search space, the search space of a (large) cycle with one
expensive edge. The second abstract space we consider is
a more natural model of search (in planning), a uniform
branching tree. Traps in these spaces are just exponentially
sized and connected sets of ε-cost edges: not the common
result of a typical random model of search (sampling edges
independently). We briefly consider why planning benchmarks
naturally give rise to such structure.

3.1 Cycle Trap 

In this section we consider the simplest abstract example
of the ε-cost 'trap', where applying increasingly powerful
heuristics and domain analysis to one's search problem gives
rise to an 'effective graph': the graph for which Dijkstra's
algorithm produces isomorphic behavior. (In particular take
h = 0 in this section.) Presumably such graphs have rather
complex shape; but certainly complex graphs contain simple
graphs as subgraphs. So if there is a problem with search
behavior in an exceedingly simple (non-uniformly weighted)
graph then we can suppose that no amount of domain analysis,
learning, heuristics, and so forth, will incidentally address
the problem: the inference must specifically address
the issue of non-uniform weights. So we are arguing that
ε-cost is by itself a fundamental challenge to be overcome in
planning: unsubsumed by other challenges.

The state-space we will consider is the cycle, with an as- 
sociated exceedingly simple metric consisting of all uniform 
weights but for a single expensive edge. There are sev- 
eral other candidates for simple non-trivial state-spaces (e.g., 
cliques), but clearly the cycle is fundamental. Its search 
space is certainly the simplest non-trivial search space: the 
rooted tree on two leaves. So the single decision to be made 
is in which direction to traverse the cycle: clockwise or 
counter-clockwise. Formally: 

ε-cost Trap: Consider the problem of making some counter,
say x, on k bits contain one less than its maximum value
($2^k - 2$), starting from 0, using only the operations of
increment and decrement. There are 2 minimal solutions:
incrementing $2^k - 2$ times, or decrementing twice (exploiting
overflow). Set the cost of incrementing and decrementing to
1, except that overflow (in either direction) costs, say,
$2^{k-1}$. Then the 2 minimal solutions cost $2^k - 2$ and
$2^{k-1} + 1$, or, normalized, $2(1 - \varepsilon)$ and
$1 + \varepsilon$.

Cost-based search is the clear loser on this problem.
While both approaches prove optimality in exponential time
($O(2^k)$), size-based discovers the optimal plan in constant
time. Of course the goal $2^k - 2$ is chosen to best illustrate
the trap. So consider the discovery problem for other goals:
for goals in $2^k[0, \frac{1}{2}]$ cost-based search is twice
as fast; just past the midpoint the performance gap narrows to
break-even; and from there up to $2^k$ the size-based approach
takes the lead, by an enormous margin. Note that below
$\frac{3}{4} 2^k$, where the cheaper direction switches, there
is a band of goals with a trade-off: size-based finds a
solution before cost-based, but cost-based finds the optimal
solution first. (Of course, time till optimality is proven
monotonically favors the cost-based approach: by a factor of 2
in the region $2^k[0, \frac{1}{2}]$, by a factor of 1 in the
region $2^k[\frac{3}{4}, 1)$, and by
$1 < (\frac{1}{2} + 2a)^{-1} < 2$ for goals of the form
$2^k(\frac{1}{2} + a)$.)

Then, even across all goals, cost-based search is still quite
inferior: the margins of victory either way are extremely
lopsided. To illustrate, consider 'large' k, say, k = 1000.
Even the most patient reader will have forcibly terminated
either search long before receiving any useful output, except
if the goal is of the form $\pm f(k)$ for some sub-exponential
$f(k)$. Both approaches discover and prove the optimal solution
in the positive case in time $O(f(k))$ (with size-based
performing twice as much work). In the negative case, only
the size-based approach manages to discover a solution (in
time $O(f(k))$) before being killed. Moreover, while it will
fail to produce a proof before death, we, based on superior
understanding of the domain, can show it to be posthumously
correct (and have: $2^k - f(k) > \frac{3}{4} 2^k$ for large k).

In summary, cost-based search on the single-decision tree 
"only explores left". Hence the trap: There is no reason 
to suppose that one direction is much worse than another 
in very large, weighted, graphs just because the first step is 
quite expensive. 
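The counter problem is small enough to simulate directly. The sketch below is an illustration under assumptions: the function name, the tuple-ordered priority queue, and the choice of counting expansions until the goal is first selected are ours; the graph itself (a k-bit counter with unit-cost steps and an overflow edge costing $2^{k-1}$, searched with h = 0) follows the description above.

```python
import heapq


def expansions_until_first_solution(k, goal, cost_based):
    """Search the k-bit counter cycle from 0 with h = 0.

    Returns (number of states expanded before the goal is selected,
    cost of the plan found). Priority is g_c if cost_based, else g_s.
    """
    n = 1 << k
    overflow = 1 << (k - 1)              # cost of wrapping around

    def successors(x):
        yield (x + 1) % n, (overflow if x == n - 1 else 1)   # increment
        yield (x - 1) % n, (overflow if x == 0 else 1)       # decrement

    frontier = [(0, 0, 0, 0)]            # (priority, g_s, g_c, state)
    seen = set()
    expanded = 0
    while frontier:
        _, gs, gc, x = heapq.heappop(frontier)
        if x == goal:
            return expanded, gc
        if x in seen:
            continue
        seen.add(x)
        expanded += 1
        for y, c in successors(x):
            if y not in seen:
                priority = gc + c if cost_based else gs + 1
                heapq.heappush(frontier, (priority, gs + 1, gc + c, y))
    raise RuntimeError("goal unreachable")


k = 10
goal = (1 << k) - 2
print(expansions_until_first_solution(k, goal, cost_based=False))
print(expansions_until_first_solution(k, goal, cost_based=True))
```

For k = 10 the size-based ordering selects the two-step plan after 4 expansions, while the cost-based ordering first walks half of the cycle (514 expansions) before selecting the same plan; the gap doubles with every additional bit.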

3.2 Branching Trap 

In the counter problem the trap is not even combinatorial;
the search problem consists of a single decision at the root,
and the trap is just an exponentially deep path. Then it
is abundantly clear that appending Towers of Hanoi to a
planning benchmark, setting its actions at ε-cost, will kill
cost-based search, even given the perfect heuristic for the
puzzle! Besides Hanoi, though, exponentially deep paths
are not typical of planning benchmarks. So in this section
we demonstrate that exponentially large subtrees on ε-cost
edges are also traps.

Consider $x > 1$ high cost actions and $y > 1$ low
cost actions in a uniform branching tree model of search
space. (A typical model for analysis, appropriate up to
the point where duplicate state checking becomes significant.
See (Pearl 1984) for similar analysis on more complex
models of search.) Suppose the solution of interest
costs C, in normalized units, so the solution lies at depth
C or greater. Then cost-based search faces a grave situation:
$O((x + y^{1/\varepsilon})^C)$ possibilities will be explored
before considering all potential solutions of cost C.

A size-based search only ever considers at most
$O((x + y)^d) = O(b^d)$ possibilities before consideration of
all potential solutions of size d; of course the more
interesting question is how long it takes to find solutions of
fixed cost rather than fixed depth (note
$C/\varepsilon \ge d \ge C$). Assuming the high cost actions
are relevant, that is, some number of them are needed by
solutions, then we have that solutions are not actually hidden
as deep as $C/\varepsilon$. Suppose, for example, that
solutions tend to be a mix of high and low cost actions in
equal proportion. Then the depth of those solutions with cost
C is $d = \frac{2C}{1 + \varepsilon}$
($\frac{d}{2} \cdot 1 + \frac{d}{2} \cdot \varepsilon = C$).
At such depths the size-based approach is the clear winner:
$O((x + y)^d) \ll O((x + y^{1/\varepsilon})^C)$ (normally).
Consider, say, $x = y = \frac{b}{2}$, then:

$$\frac{b^d}{(x + y^{1/\varepsilon})^C} < \frac{b^d}{y^{C/\varepsilon}} = \frac{2^{C/\varepsilon}}{b^{\frac{C(1-\varepsilon)}{\varepsilon(1+\varepsilon)}}}$$

and, provided $\varepsilon < \frac{1 - \log_b 2}{1 + \log_b 2}$
(for $b = 4$, $\varepsilon < \frac{1}{3}$), the last is
always less than 1 and, for that matter, goes, quickly, to 0 as
C increases and/or b increases and/or ε decreases.

Generalizing, the size-based approach is faster at finding 
solutions of any given cost, as long as (1) high-cost actions 
constitute at least some constant fraction of the solutions 
considered, (2) the ratio between high-cost and low-cost is 
sufficiently large, (3) the effective search graph (post addi- 
tional inference) is reasonably well modeled by an infinite 
uniform branching tree (i.e., huge), and (4) the search is sys- 
tematic. 
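To get a feel for the magnitudes involved, the following sketch evaluates both bounds in log scale, which avoids overflow. The function names are hypothetical, and the size-based bound assumes, as in the example above, solutions that mix high and low cost actions in equal proportion, so that their depth is 2C/(1 + ε).

```python
import math


def log10_size_bound(x, y, C, eps):
    """log10 of (x + y)^d, with d = 2C / (1 + eps) as in the equal-mix example."""
    d = 2 * C / (1 + eps)
    return d * math.log10(x + y)


def log10_cost_bound(x, y, C, eps):
    """log10 of (x + y^(1/eps))^C, computed without forming y^(1/eps)."""
    a = math.log10(y) / eps                      # log10 of y^(1/eps)
    # log10(x + 10^a) = a + log10(1 + x * 10^(-a)); exp underflows to 0 safely
    correction = math.log10(1 + x * math.exp(-a * math.log(10)))
    return C * (a + correction)


# Assumed sample values: b = 4 split evenly, cost C = 10, eps = 0.01.
size = log10_size_bound(2, 2, 10, 0.01)
cost = log10_cost_bound(2, 2, 10, 0.01)
print(round(size, 2), round(cost, 2))
```

With these sample values the size-based bound is roughly 10^12 nodes, whereas the cost-based bound is roughly 10^301; with uniform costs (ε = 1) the two expressions coincide.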



4 Search Effort as Flooding Topological 
Surfaces of Evaluation Functions 

We view evaluation functions (f) as topological surfaces
over search nodes, so that generated nodes are visited in,
roughly, order of f-altitude. With non-monotone evaluation
functions, the set of nodes visited before a given node is all
those contained within some basin of the appropriate depth;
picture water flowing from the initial state: if there are
dams then such a flood could temporarily visit high altitude
nodes before low altitude nodes. (With very inconsistent
heuristics, i.e., large heuristic weights, the metaphor loses
explanatory power, as there is nowhere to go but downhill.
See (Dechter and Pearl 1985) for comprehensive details.)

If we take a single point inside such a basin (but not one 
defining the brim) and alter its altitude over the entire range 
of that basin's depth, we will not have changed the set of 
states inundated prior to the brim. If there were no solutions 
prior to the brim, then we will not have altered any externally 
visible behavior of the search: Whenever best-first search 
finally finds a solution it will no longer have mattered how 
all the prior nodes were ordered. To illustrate, IDA* deserves 
its name, despite exploring the space in an entirely different 
order from A* in any given iteration. 

In particular, controlling the behavior of search by alter- 
ing the evaluation function is a very different proposition in 
the two contexts of local search and best-first search. For the 
latter, preventing exploration of some choice requires raising 
its altitude (or that of a cut-set) to past that of a solution of 
interest, actually, past the altitude of every cut-set separat- 
ing that solution from the initial vertex (the altitude of a set 
is the minimum over its elements), i.e., past the rim of the 
deepest basin preventing inundation of the solution. For the 
former, mitigating exploration is merely a matter of making 
the choice worse than its best sibling; the ideal amount of
penalization depends on the nature of randomization applied.¹

Formally, but with narrower context: Consider an $h_c$ that
is derived by optimally solving relaxed problems, or just
directly suppose that $h_c$ is guaranteed to be admissible and
consistent (Pearl 1984). Consider the altitude ($f_c^*(i)$) of
the cost-optimal solution under $f_c$. All lower-altitude nodes
comprise the cost-optimal footprint. Exhausting the footprint
is a proof, relative to $h_c$ being admissible, of the purported
optimality of the known solution (with $h_c$ consistent,
exhaustion is moreover necessary for proof by search). As the
order of doing so does not affect correctness of the proof,
there is significant freedom/futility (depending on your
perspective) in the choice of evaluation function: Every
systematic search is equivalent (does the same amount of total
work) if $h_c$ and $f_c^*(i)$ are given. When re-expansion is a
significant possibility, then the appropriate statement is
that the same set of states are expanded, some, hopefully few,
more than once. It follows that performing two levels of
search, the outer search taking guesses at $f_c^*(i)$, is a
powerful idea (as in IDA*, or in the standard treatment of
optimization problems as decision problems).
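The two-level idea can be sketched as a textbook-style iterative deepening on the cost bound. This is a generic illustration, not the procedure of any planner discussed here; the function names, the `(child, cost)` successor interface, and the toy graph are assumptions.

```python
import math


def bounded_dfs(s, g, bound, is_goal, successors, h, path):
    """Inner search: depth-first, pruning nodes whose f = g + h exceeds bound.

    Returns (plan, f) on success, or (None, smallest f that exceeded bound).
    """
    f = g + h(s)
    if f > bound:
        return None, f
    if is_goal(s):
        return list(path), f
    next_bound = math.inf
    for child, cost in successors(s):
        if child in path:                # avoid cycles on the current path
            continue
        path.append(child)
        found, t = bounded_dfs(child, g + cost, bound,
                               is_goal, successors, h, path)
        path.pop()
        if found is not None:
            return found, t
        next_bound = min(next_bound, t)
    return None, next_bound


def iterative_deepening(start, is_goal, successors, h):
    """Outer search: successive guesses at the optimal solution cost."""
    bound = h(start)
    while bound < math.inf:
        found, t = bounded_dfs(start, 0, bound, is_goal, successors, h, [start])
        if found is not None:
            return found, t
        bound = t                        # raise the guess to the next f value
    return None


# Toy graph (an assumption for illustration).
graph = {"A": [("B", 10), ("C", 1)], "C": [("D", 1)], "D": [("B", 1)], "B": []}
result = iterative_deepening("A", lambda s: s == "B",
                             lambda s: graph[s], lambda s: 0)
print(result)
```

Within one iteration the inner search is free to visit the bounded region in any order (here depth-first), which is exactly the freedom described above; only the sequence of bounds is cost-ordered.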

That is, it is futile to attempt to expand less than A*, but
one is free to expand that set in any order. For example,
with an oracular guess of $f_c^*(i)$, it is possible to
terminate in equal time yet print the optimal solution sooner
than A*: take the evaluation function to be $-f_c$, so that the
optimal solution is expanded as soon as it is generated, at
which time, perhaps, the open list still contains some states
with $-f_c(s) > -f_c^*(i)$. Indeed, as the optimal solution is
guaranteed to be the last path expanded, up to tie-breaking,
under the evaluation function $f_c$, any (other) evaluation
function (monotonically) improves upon the performance of A*,
with respect to the problem of finding the optimal solution.

¹ The second best sibling could be second most likely to be
chosen, but it could also be the least likely to be chosen.

Worst-case: The minimum gradient in g bounds the worst case
of the discovery problem: it puts a limit on the number of
search nodes that could conceivably be considered just as good
as some solution of interest. For example, in uniformly
branching trees the absolute worst-case bound is

$$b^{\,d \frac{\max \nabla g}{\min \nabla g}}$$

(with d the depth of the unique solution). Insisting on a
fairer distribution of edge costs and/or considering non-zero
heuristics (but still imperfect) lowers the bound, but not
asymptotically: still
$O(b^{\,d \frac{\max \nabla g}{\min \nabla g}})$ many search
nodes might be expanded before finding the solution (in the
worst case of a unique solution on d maximum cost actions).
Other search models yield different bounding expressions, but
all will be increasing functions of
$\frac{\max \nabla g}{\min \nabla g}$. Considering normalized
representations, $\max \nabla g$ is just 1, and so we have
that $f_s$ enjoys the tightest bound, since
$\min \nabla g_s = 1$. In contrast, $f_c$ suffers from the
'loosest' bound, as $\min \nabla g_c = \varepsilon \ll 1$, in
the sense that one presumably devotes bits to specifying
costs (in binary), so one cannot do worse than exponentially
small except by permitting zero costs. Taking worst-case for
some specific f to mean a problem with maximum search
nodes at every altitude, with a unique solution of maximum
cost (given its size), then, for the discovery problem: (1)
Size-based search achieves the asymptotically best-possible
worst-case performance. (2) Cost-based search 'achieves'
the asymptotically worst-possible worst-case performance.

Note that all that is being said is that a malicious
problem-setter has control of the metric, so any
quality-sensitive search can be misdirected.

Typical-case: Every choice of search topology will even- 
tually lead to identification of the optimal solution and ex- 
haustion of the cost-optimal footprint. Some will produce a 
whole slew of suboptimal solutions along the way, eventu- 
ally reaching a point where one begins to wonder if the most 
recently reported solution is optimal. Others report nothing 
until finishing. The former are interruptible, and are rather 
more desirable than the latter. That is, admissible cost-based 
topology is the worst possible choice: it is the least interrupt- 
ible. There is no point at which one can forcibly terminate 
and receive anything for one's investment of computational
resources. Gaining interruptibility is a matter of raising the 
altitude of large portions of the footprint in exchange for 
lowering the altitude of a smaller set of non-footprint search 
nodes (leaving the solution of interest fixed). Note that there 
must be a trade-off (else one has devised a better heuristic): 
interruptibility comes at the expense of total work. 

With size-based topology, the large set is the set of longer
yet cheaper plans, while the small set is the shorter yet
costlier plans. In general one expects there to be many more
longer plans than shorter plans in combinatorial problems,
but that changes if the problem is hardest possible in finite
spaces, i.e., all goal states are as far away as possible so
that cheap solutions are also necessarily long. There is no
reason to suppose that size-based topology is the best
possible trade-off; it just demonstrates existence of better
approaches than admissible cost-based topology. Inadmissible
cost-based topology, such as WA*, can also demonstrate
existence of better approaches.

Weighting the heuristic, though, magnifies depth-first
behavior, which is great up until finding a solution, but
afterwards leads to poor backtracking behavior. For example,
depth-first bias in a non-uniformly weighted uniform
branching tree permits catastrophic backtracking behavior:
exhaustion of maximum size ε-cost traps. (And tree models
are better fits under depth-first bias, as state re-expansion
is more likely due to finding better paths later.) Dynamically
weighting the heuristic is one approach (Pohl 1973), attacking
the contribution that non-uniform accuracy of heuristics
has on such backtracking; one could also consider randomized
restarts of WA* along with a decreasing schedule of weights.³
Employing multiple open lists (as in LAMA) is a different
approach (than restarting) to permitting non-local
backtracking; EES (Thayer and Ruml 2010) does so while also,
unlike the preceding, explicitly considering the further
impact that non-uniform weights have, achieving an interesting
blend of cost and size considerations. One could characterize
it as cost-bounded size optimization; it is also interesting
to consider reformulating EES as size-bounded cost
optimization, particularly considering the behavior of
GraphPlan/BlackBox (Blum and Furst 1995; Kautz and Selman 1999)
and relatives.

5 ε-cost Trap in Practice

In this section we demonstrate existence of the problematic
planner behavior in a realistic setting: running LAMA on
problems in the travel domain (simplified ZenoTravel, zoom
and fuel removed), as well as two other IPC domains. Analysis
of LAMA is complicated by many factors, so we also
test the behavior of SapaReplan on simpler instances (but in
all of ZenoTravel). The first set of problems concern a
rendezvous at the center city in the location graph depicted in
Figure 1; the optimal plan arranges a rendezvous at the center
city. The second set of problems is to swap the positions
of passengers located at the endpoints of a chain of cities.

5.1 LAMA 

In this section we demonstrate the performance problem
wrought by ε-cost in a state-of-the-art (2008) planner:
LAMA (Richter and Westphal 2010), the leader of the
cost-sensitive (satisficing) track of IPC'08 (Helmert, Do, and
Refanidis 2008). With a completely trivial recompilation (set
a flag) one can make it ignore the given cost function,
effectively searching by $f_s$. With slightly more work one can
do better and have it use $\hat{f}_s$ as its evaluation
function, i.e., have the heuristic estimate d and the search be
size-based, but still compute costs correctly for
branch-and-bound. Call this latter modification LAMA-size.
Ultimately, the observation is that LAMA-size outperforms
LAMA: no trivial feat, particularly for such a small change in
implementation.

² Besides a better lower bound.

³ The possibility of state re-expansion greatly exacerbates poor
backtracking behavior, so it is worthwhile to keep in mind that an
iterated search need not re-expand states immediately.

Figure 1: Rendezvous problems. Diagonal edges cost 7,000,
exterior edges cost 10,000. Board/Debark cost 1.

Domain        LAMA     LAMA-size
Rendezvous    70.8%    83.0%
Elevators     79.2%    93.6%
Woodworking   76.6%    64.1%

Table 1: IPC metric on LAMA variants.


LAMA⁴ defies analysis in a number of ways: landmarks,
preferred operators, dynamic evaluation functions, multiple
open lists, and delayed evaluation, all of which affect
potential search plateaus in complex ways. Nonetheless, it is
essentially a cost-based approach.

Results.⁵ With more than about 8 total passengers, LAMA
is unable to complete any search stage except the first (the
greedy search). For the same problems, LAMA-size finds
the same first plan (the heuristic values differ, but not the
structure), but is then subsequently able to complete further
stages of search. In so doing it sees marked improvement in
cost; on the larger problems this is due only to finding
better variants on the greedy plan. Other domains are included
for broader perspective; Woodworking in particular was chosen
as a likely counter-example, as all the actions concern
just one type of physical object and the costs are not wildly
different. For the same reasons we would expect LAMA
to outperform LAMA-size in some cost-enhanced version
of Blocksworld. For a comprehensive empirical analysis,
see (Richter and Westphal 2010).

5.2 SapaReplan 

We also consider the behavior of SapaReplan on the simpler 
set of problems. This planner is much less sophisticated
in terms of its search than LAMA, in the sense of being 
much closer to a straight up implementation of weighted A* 
search. The problem is just to swap the locations of passen- 
gers located on either side of a chain of cities. A plane starts 
on each side, but there is no actual advantage to using more 
than one (for optimizing either of size or cost): the second 

Options: 'fFlLi'.

New best plans for Elevators were found (largely by LAMA-size).
The baseline planner's score is 71.8% against the better
reference plans.

Except that these problems are run on all of ZenoTravel.



plane exists to confuse the planner. Observe that the smallest
and cheapest plans are the same. So in some sense the
concepts have become only superficially different; but this is
just what makes the problem interesting, as despite this
similarity the behavior of search is still strongly affected
by the nature of the evaluation function. We test the
performance of f_s and f_c, as well as a hybrid evaluation
function similar to f_s + f_c (with costs normalized). We also
test hybridizing via tie-breaking conditions, which ought to
have little effect given the rest of the search framework.

Results. The size-based evaluation functions find better
cost plans faster (within the deadline) than cost-based evalu- 
ation functions. The hybrid evaluation function also does 
relatively well, but not as well as could be hoped. Tie- 
breaking has little effect, sometimes negative. 
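
The five modes compared here can be pictured as different sort keys on the open list. The sketch below is illustrative only and is not SapaReplan's code: the `Node` fields, the mode names, and the normalization by a single `cost_scale` divisor are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    g_size: int    # actions applied so far
    h_size: int    # estimated actions to go
    g_cost: float  # cost accumulated so far
    h_cost: float  # estimated cost to go


def f_size(node):
    return node.g_size + node.h_size


def f_cost(node):
    return node.g_cost + node.h_cost


def priority(node, mode, cost_scale=1.0):
    """Sort key for the open list; smaller keys are expanded first."""
    fs = f_size(node)
    fc = f_cost(node) / cost_scale  # assumed normalization: one divisor
    keys = {
        "size": (fs,),
        "cost": (fc,),
        "hybrid": (fs + fc,),
        "size_tb_cost": (fs, fc),  # size first, cost breaks ties
        "cost_tb_size": (fc, fs),  # cost first, size breaks ties
    }
    return keys[mode]


a = Node(g_size=2, h_size=3, g_cost=20001.0, h_cost=1.0)
b = Node(g_size=4, h_size=2, g_cost=10004.0, h_cost=2.0)
c = Node(g_size=2, h_size=3, g_cost=10003.0, h_cost=0.0)
print(sorted([a, b, c], key=lambda n: priority(n, "size_tb_cost")) == [c, a, b])
```

Tie-breaking only matters between nodes whose primary values coincide exactly, which is one way to see why it would be expected to have little effect.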

We note that Richter and Westphal (2010) also report that
replacing the cost-based evaluation function with a pure
size-based one improves performance over LAMA in multiple
other domains. Our version of LAMA-size uses a
cost-sensitive size-based search, and our results, in the
domains we investigated, seem to show bigger improvements
over LAMA.

Finally, while LAMA-size outperforms LAMA, our theory
of ε-cost traps suggests that cost-based search should fail
even more spectacularly. In the appendix, we take a much
closer look at one domain, the travel domain, and present a
detailed study of which extensions of LAMA help it
temporarily mask the pernicious effects of cost-based search.
Our conclusion is that both LAMA and SapaReplan manage
to find solutions to problems in the travel domain despite
the use of a cost-based evaluation function by using various
tricks to induce a limited amount of depth-first behavior in
an A* framework. This has the potential effect of delaying
exploration of the ε-cost plateaus slightly, past the discovery
of a solution, but each planner is still ultimately trapped by
such plateaus before being able to find really good solutions.
In other words, such tricks mostly serve to mask the
problems of cost-based search (and ε-cost), as they merely
delay failure by just enough that one can imagine that the
planner is now effective (because it returns a solution where
before it returned none). Using a size-based evaluation
function more directly addresses the existence of cost plateaus,
and not surprisingly leads to improvement over the
equivalent cost-based approach, even with LAMA.

6 Conclusion 

The practice of combinatorial search in automated planning 
is satisficing. There is a great call for deeper theories of satis- 
ficing search, and one perhaps significant obstacle in the way 
of such research is the pervasive notion that perfect problem 
solvers are the ones giving only perfect solutions. Actually 
implementing cost-based, systematic, combinatorial search
reinforces this notion, and therein lies its greatest harm.

The results differ markedly between the 2 and 3 city sets of
problems because the sub-optimal relaxed plan extraction in the
2-cities problems coincidentally produces an essentially perfect
heuristic in many of them. One should infer that the solutions found
in the 2-cities problems are sharply bimodal in quality and that the
meaning of the average is then significantly different than in the
3-cities problems.





                          2 Cities        3 Cities
Mode                      Score   Rank    Score   Rank
Hybrid                    88.8%   1       43.1%   2
Size                      83.4%   2       43.7%   1
Size, tie-break on cost   82.1%   3       43.1%   2
Cost, tie-break on size   77.8%   4       33.3%   3
Cost                      77.8%   4       33.3%   3



Table 2: IPC metric on SapaReplan variants in ZenoTravel. 



In support of the position we demonstrated the technical 
difficulties arising from such use of a cost-based evaluation 
function, largely by arguing that the size-based alternative 
is a notably more effective default strategy. We argued that 
using cost as the basis for plan evaluation is a purely ex- 
ploitative perspective, leading to least interruptible behavior. 
Being least interruptible, it follows that implementing cost- 
based search will typically be immediately harmful to that 
particular application. But regardless of whether the partic- 
ular instance demonstrates the rule or the exception, the last- 
ing harm is in reinforcing the wrong definition of satisficing 
search in the first place. In conclusion, as a rule:
cost-based search is harmful.

A Deeper Analysis of the Results in Travel
Domain

In this section we analyze the reported behavior of LAMA
and SapaReplan in greater depth. We begin with a general
analysis of the domain itself and the behavior of (simplistic)
systematic state-space search upon it, concluding that
cost-based methods suffer an enormous disadvantage. The
empirical results are not nearly so dramatic as the dire
predictions of the theory, or at least do not appear so. We
consider to what extent the various additional techniques of
the planners (violating the assumptions of the theory) in fact
mitigate the pitfalls of ε-cost, and to what extent these only
serve to mask the difficulty.

A.1 Analysis of Travel Domain

We argue that search under f_c pays a steep price in time and
memory relative to search under f_s. The crux of the matter is
that the domain is reversible, so relaxation-based heuristics
cannot penalize fruitless or even counter-productive
passenger movements by more than the edge-weight of that
movement. Then plateaus in g are plateaus in f, and the
plateaus in g_c are enormous.

First note that the domain has a convenient structure: the
global state space is the product of the state space of
shuffling planes around between cities/airports via the fly
action (expensive), and the state space of shuffling people
around between (stationary) planes and cities/airports via
the board/debark actions (cheap). For example, in the
rendezvous problems, there are 5^4 = 625 possible assignments
of planes to cities, and (5 + 4)^(2k) possible assignments of
passengers to locations (planes + cities), so that the global
state space has exactly 5^4 · 9^(2k) reachable states (with k the
number of passengers at one of the origins).
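
To give a feel for the sizes involved, the product structure can be evaluated directly. This is a small worked example, not a measurement: the function name is hypothetical, and it is parameterized by the total number of passengers rather than by k.

```python
def travel_state_count(cities, planes, passengers):
    """Size of the product space: plane placements times passenger placements."""
    plane_assignments = cities ** planes
    # Each passenger is either in a city or aboard a plane.
    passenger_assignments = (cities + planes) ** passengers
    return plane_assignments * passenger_assignments


print(travel_state_count(5, 4, 0))  # 625 (plane assignments alone)
print(travel_state_count(5, 4, 4))  # 4100625
print(travel_state_count(5, 4, 8))  # 26904200625
```

Almost all of the growth comes from the cheap passenger-shuffling factor, while the expensive plane-shuffling factor stays fixed at 625.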

Boarding and debarking passengers is extremely cheap,
say on the order of cents, while flying planes between cities
is quite a bit more expensive, say on the order of hundreds of
dollars (from the perspective of passengers). So 1/ε ≈ 10000
for this domain: a constant, but much too large to ignore.

To analyze state-space approaches in greater depth let 
us make all of the following additional assumptions: The 
heuristic is relaxation-based, imperfect, and in particular 
heuristic error is due to the omission of actions from re- 
laxed solutions relative to real solutions. Heuristic error is 
not biased in favor of less error in estimation of needed fly 
actions — in this problem planes are mobiles and containers 
whereas people are only mobiles. Finally, there are signifi- 
cantly but not overwhelmingly more passengers than planes. 

Fuel and zoom are distracting aspects of ZenoTravel-STRIPS,
so we remove them. Clever domain analysis could do the same.

Then consider a child node, in plane-space, that is in fact
the correct continuation of its parent, but the heuristic fails
to realize it. So its f is higher by the cost or size of one
plane movement: 1 under normalized costs. Moreover
assume that moving passengers is not heuristically good (in
this particular subspace). (Indeed, moving passengers is
usually a bad idea.) Then moving a passenger increases f_c by
at most 2ε (and at least ε), once for g_c and once for h_c.
As 1/(2ε) ≈ 5000, search under f_c explores the
passenger-shuffling space of the parent to, at least, depth
5000. Should the total heuristic error in fact exceed one fly
action, then each such omission will induce backtracking to
a further 5000 levels: for any search node n reached by a fly
action set e_c(n) = f_c(x) - f_c(n) with x some solution of
interest (set e_s similarly). Then if search node n ever appears
on the open list it will have its passenger-shuffling subspace
explored, under f_c, to at least depth e_c · 5000 before x is
found (and at most depth e_c · 1/ε). Under f_s, we have instead
exploration up to at least depth e_s · 1/2 and at most depth
e_s · 1.
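
The depth bounds above follow from dividing the heuristic error by the largest and smallest possible increase in f per passenger move. The sketch below works the arithmetic with exact fractions; the function name is hypothetical and ε = 1/10000 is taken from the estimate for this domain.

```python
from fractions import Fraction


def plateau_depth_bounds(error, min_step, max_step):
    """Return (least, greatest) depth explored before f rises by `error`.

    Each passenger move raises f by between min_step and max_step.
    """
    return error / max_step, error / min_step


eps = Fraction(1, 10_000)
one_fly_action = Fraction(1)  # heuristic error of one fly action, normalized

# Cost-based: each move adds between eps and 2*eps to f_c.
cost_bounds = plateau_depth_bounds(one_fly_action, eps, 2 * eps)
# Size-based: each move adds between 1 and 2 to f_s.
size_bounds = plateau_depth_bounds(one_fly_action, Fraction(1), Fraction(2))

print(cost_bounds)  # (Fraction(5000, 1), Fraction(10000, 1))
print(size_bounds)  # (Fraction(1, 2), Fraction(1, 1))
```

The two cases differ only in the step sizes, so the ratio between the cost-based and size-based depths is exactly 1/ε.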

As 5000 objects is already far above the capabilities of
any current domain-independent planner, we can say that
for most of the plane-shuffling states considered, cost-based
search exhausts the entire associated passenger-shuffling
space during backtracking. That is, it stops exploring the
space because it exhausts the finite possibilities, rather than
by adding up sufficiently many instances of 2ε increases
in f; the result is the same as if the cost of passenger
movement were 0. Worse, such exhaustion commences
immediately upon backtracking for the first time (with
admissible heuristics). Even with inadmissible heuristics,
unless they are very inadmissible (large heuristic weights),
systematic search should easily get trapped on cost plateaus
before finding a solution.

In contrast, size-based search will be exhausting only
those passenger assignments differing in at most e_s values;
in the worst case this is equivalent to the cost-based method,
but for good heuristics it is a notable improvement. (In
addition the size-based search will be exploring the
plane-shuffling space deeper, but that space is [assumed to be]
much smaller than any single passenger-shuffling space.)
Then it is likely the case that cost-based search dies before
reporting a solution while size-based search manages to find
one or more.
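
The contrast can be reproduced on a toy product space. The following is a deliberately simplified sketch, not the travel domain itself: the heuristic is zero (so the search is uniform-cost), the passenger-shuffling space is replaced by a single ring of `ring` positions, and all names and parameters are hypothetical.

```python
import heapq


def count_expansions(n_flights, ring, fly_cost, shuffle_cost):
    """Expansions made by uniform-cost search before a goal is selected.

    A state is (flights_done, dial). 'fly' advances flights_done;
    'shuffle' moves dial one step either way around a ring. The goal is
    any state with flights_done == n_flights.
    """
    start = (0, 0)
    best_g = {start: 0}
    open_list = [(0, start)]
    expanded = 0
    while open_list:
        g, state = heapq.heappop(open_list)
        if g > best_g[state]:
            continue  # stale entry superseded by a cheaper path
        flights, dial = state
        if flights == n_flights:
            return expanded
        expanded += 1
        successors = [
            ((flights + 1, dial), fly_cost),
            ((flights, (dial + 1) % ring), shuffle_cost),
            ((flights, (dial - 1) % ring), shuffle_cost),
        ]
        for nxt, step in successors:
            if g + step < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + step
                heapq.heappush(open_list, (g + step, nxt))
    return expanded


# Cost-based: flying costs 10000, shuffling costs 1.
cost_based = count_expansions(3, 200, fly_cost=10_000, shuffle_cost=1)
# Size-based: every action counts as one step.
size_based = count_expansions(3, 200, fly_cost=1, shuffle_cost=1)
print(cost_based, size_based)  # 600 15
```

Under cost the search exhausts the whole ring at every flight count before it can afford the next flight; under size it touches only the few states within three steps of the start.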

A.2 Analyzing LAMA's Performance 

While LAMA-size out-performs LAMA, the difference is hardly
as dramatic as predicted above. Here we analyze the
results in greater depth, in an attempt to understand how
LAMA avoids being immediately trapped by the
passenger-shuffling spaces. Our best, but not intuitive,
explanation is that its pessimistic delayed evaluation leads to
a temporary sort of depth-first bias, allowing it to skip
exhaustion of many of the passenger-shuffling spaces until
after finding a solution. So, (quite) roughly, LAMA is able
to find one solution, but not two.

Landmarks. The passenger-shuffling subspaces are search 
plateaus, so, the most immediate hypothesis is that LAMA's 
use of landmarks helps it realize the futility of large portions 
of such plateaus (i.e., by pruning them). However, LAMA 
uses landmarks only as a heuristic, and in particular uses 
them to order an additional (also cost-based) open list (tak- 
ing every other expansion from that list), and the end result is 
actually greater breadth of exploration, not greater pruning. 

Multiple Open Lists. Then an alternative hypothesis is that
LAMA avoids immediate death by virtue of this additional 
exploration, i.e., one open list may be stuck on an enor- 
mous search plateau, but if the other still has guidance then 
potentially LAMA can find solutions due to the secondary 
list. In fact, the lists interact in a complex way so that con- 
ceivably the multiple-list approach even allows LAMA to 
'tunnel' out of search plateaus (in either list, so long as the 
search plateaus do not coincide). Indeed the secondary list
improves performance, but turning it off still does not cripple
LAMA, let alone outright kill it.

Small Instances. It is illuminating to consider the behavior
of LAMA and LAMA-size with only 4 passengers total; here
the problem is small enough that optimality can be proved.
LAMA-size terminates in about 12 minutes. LAMA
terminates in about 14.5 minutes. Of course the vast majority of
time is spent in the last iteration (with heuristic weight 1
and all actions considered), and both are unrolling the
exact same portion of state space (which is partially verifiable
by noting that LAMA reports the same number of unique states
in both modes). There is only one way that such a result is at
all possible: the cost-based search is re-expanding many more
states. That is difficult to believe; if anything it is the
size-based approach that should be finding a greater number of
suboptimal paths before hitting upon the cheapest. The
explanation is two-fold. First, pessimistic delayed
evaluation leads to a curious sort of depth-first behavior.
Secondly, cost-based search pays far more dearly for failing to
find the cheapest path first.

Delayed Evaluation. LAMA's delayed evaluation is not
equivalent to just pushing the original search evaluation
function down one level. This is because it is the heuristic
which is delayed, not the full evaluation function. LAMA's
evaluation function is the sum of the parent's heuristic on
cost-to-go and the child's cost-to-reach: f_L(n) = g(n) +
h(n.p.v). One can view this technique, then, as a
transformation of the original heuristic. Crucially, the technique
increases the inconsistency of the heuristic. Consider an
optimal path and the perfect heuristic. Under delayed evaluation
of the perfect heuristic, each sub-path has an f_L-value in
excess of f* by exactly the cost of the last edge. So a high
cost edge followed by a low cost edge demonstrates the
non-monotonicity of f_L induced by the inconsistency wrought by
delayed evaluation. The problem with non-monotonic
evaluation functions is not the decreases per se, but the increases
that precede them. In this case, a low cost edge followed by
a high cost edge along an optimal path induces backtracking
despite the perfection of the heuristic prior to being delayed.
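
The claim that each sub-path exceeds f* by exactly its last edge cost can be checked on a concrete path. This is a small sketch with a hypothetical function name; it assumes the perfect heuristic, computed here as the remaining cost of the given optimal path.

```python
def delayed_f_values(edge_costs):
    """f_L along an optimal path under delayed evaluation of the perfect heuristic."""
    f_star = sum(edge_costs)         # cost of the whole optimal path
    values, g = [], 0
    for cost in edge_costs:
        h_parent = f_star - g        # perfect cost-to-go of the parent
        g += cost                    # cost-to-reach of the child
        values.append(g + h_parent)  # f_L(child) = g(child) + h(parent)
    return f_star, values


f_star, f_l = delayed_f_values([1, 10_000, 1])
print([value - f_star for value in f_l])  # [1, 10000, 1]
```

The excess jumps from 1 to 10000 on the expensive edge, which is the increase that sends the search backtracking, and then drops back to 1.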

Depth-first Bias. Consider some parent n and two children
x and y (x.p = n, y.p = n) with x reached by some cheap
action and y reached by some expensive action. Observe that
siblings are always expanded in order of their cost-to-reach
(as they share the same heuristic value), so x is expanded
before y. Now, delaying evaluation of the heuristic was
pessimistic: h(x.v) was taken to be h(n.v), so that it appears
that x makes no progress relative to n. Suppose the
pessimism was unwarranted, for argument's sake, say entirely
unwarranted: h(x.v) = h(n.v) - c(x.o). Then consider a
cheap child of x, say w. We have:

\begin{align}
f_L(w) &= g(w) + h(x.v), \tag{1}\\
       &= g(x) + c(w.o) + h(n.v) - c(x.o), \tag{2}\\
       &= f_L(x) - c(x.o) + c(w.o), \tag{3}\\
       &= f(n) + c(w.o), \tag{4}
\end{align}



so in particular, f_L(w) < f_L(y) because f(n) + c(w.o) <
f(n) + c(y.o). Again suppose that w makes full progress
towards the goal (the pessimism was entirely unwarranted),
so h(w.v) = h(x.v) - c(w.o). So any of its cheap children,
say z, satisfies:

\begin{align}
f_L(z) &= g(w) + c(z.o) + h(x.v) - c(w.o), \tag{6}\\
       &= f_L(w) - c(w.o) + c(z.o), \tag{7}\\
       &= f_L(x) - c(x.o) + c(w.o) - c(w.o) + c(z.o), \tag{8}\\
       &= f_L(x) - c(x.o) + c(z.o), \tag{9}\\
       &= f(n) + c(z.o). \tag{10}
\end{align}

Inductively, any low-cost-reachable descendant, say x', that
makes full heuristic progress, has an f_L-value of the form
f(n) + c(x'.o), and in particular f_L(x') < f_L(y); that is, all
such descendants are expanded prior to y. Generalizing, any
low-cost-reachable and not heuristically bad descendant of x
is expanded prior to y (where the bound on heuristic badness
is c(y.o), the amount by which y is pessimistically
considered heuristically bad). Once y itself is finally expanded,
its descendants can compete with the descendants of x
on even footing, so in particular some of the expensive
exits of the low-cost subspace underneath y may very well be
explored prior to some of the expensive (heuristically or
immediately) exits of the low-cost subspace underneath x.
This is in contrast with the low-cost subspaces themselves,
which were explored in depth-first fashion, i.e., all of x's
subspace before all of y's subspace.
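
The derivation can be spot-checked numerically. The values below are arbitrary illustrations (not taken from any planner run), chosen so that x, w and z are reached by cheap actions making full heuristic progress while y is reached by an expensive one.

```python
def delayed_f(g_child, h_parent):
    """Delayed evaluation: the child's cost-to-reach plus the parent's heuristic."""
    return g_child + h_parent


g_n, h_n = 0, 30_000                 # parent n
c_x, c_w, c_z, c_y = 1, 2, 3, 10_000  # action costs (illustrative)
f_n = g_n + h_n

g_x, h_x = g_n + c_x, h_n - c_x      # x makes full progress
g_w, h_w = g_x + c_w, h_x - c_w      # so does w
g_z = g_w + c_z
g_y = g_n + c_y

f_l_w = delayed_f(g_w, h_x)
f_l_z = delayed_f(g_z, h_w)
f_l_y = delayed_f(g_y, h_n)

print(f_l_w == f_n + c_w, f_l_z == f_n + c_z)  # True True
print(f_l_w < f_l_y and f_l_z < f_l_y)         # True
```

Each cheap descendant sits above f(n) only by its own last action cost, so the whole cheap subtree under x is ordered ahead of the expensive sibling y.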

Then LAMA exhibits a curious, temporary, depth-first be- 
havior initially, but in the large exhibits the normal breadth- 
first bias of systematic search. Depth-first behavior certainly 
results in finding an increasingly good sequence of plans to 
the same state: At every point in the best plan to some state 
where a less-expensive sibling leads to a slightly worse plan 
to the same state is a point at which depth-first behavior finds 
worse plans first. The travel domain is very strongly con- 
nected, so there are many such opportunities. 

Overhead. Consider two paths to the same plane-shuffling
state, the second one actually (but not heuristically)
better. Then LAMA has already expanded the vast majority,
if not the entirety, of the associated passenger-shuffling
subspace before finding the second plan. That entire set is
then re-expanded. The size-based approach is not
compelled to exhaust the passenger-shuffling subspaces in the
first place (indeed, it is compelled to backtrack to other
possibilities), and so in the same situation ends up performing
less re-expansion work within each passenger-shuffling
subspace. Then even if the size-based approach is overall
making more mistakes in its use of planes (finding worse plans
first), which is to be expected, the price per such mistake is
notably smaller.

Summary. LAMA is out-performed by LAMA-size, due to
the former spending far too much time expanding and
re-expanding states in the ε-cost plateaus. It fails in
"depth-first" mode: finding not-cheapest almost-solutions,
exhausting the associated cheap subspace, backtracking, finding
a better path to the same state, re-exhausting that subspace,
and so on, in particular exhausting memory extremely slowly
(it spends all of its time re-exhausting the same subspaces).

A.3 Analyzing the Performance of SapaReplan 

The contrasting failure mode, "breadth-first", is character- 
ized by exhausting each such subspace as soon as it is en- 
countered, thereby rapidly exhausting memory, without ever 
finding solutions. This is largely the behavior of SapaRe- 
plan (which does eager evaluation), with cost-based methods 
running out of memory (much sooner than the deadline, 30 
minutes) and size-based methods running out of time. So for 
SapaReplan it is the size-based methods that are performing 
many more re-expansions, as in a much greater amount of 
time they are failing to run out of memory. From the results, 
these re-expansions must be in a useful area of the search 
space. 

In particular it seems that the cost-based methods must 
indeed be exhausting the passenger-shuffling spaces more 
or less as soon as they are encountered — as otherwise it 
would be impossible to both consume all of memory yet fail 
to find better solutions. (Even with fuel there are simply too 
few distinct states modulo passenger-shuffling.) However, 
they do find solutions before getting trapped, in contradic- 
tion with theory. 

The explanation is just that the cost-based methods are run 
with large (5) heuristic weight, thereby introducing signifi- 
cant depth-first bias (but not nearly so significant as with pes- 
simistic delayed evaluation), so that it is possible for them to 
find a solution before attempting to exhaust such subspaces. 
It follows that they find solutions within seconds, and then 
spend minutes exhausting memory (and indeed that is what 
occurs). The size-based methods are run with small heuris- 
tic weight (2) as they tend to perform better in the long run 
that way. It would be more natural to use the same heuris- 
tic weight for both types, but, the cost-based approaches do 
conform to theory with small heuristic weights — producing 
no solutions, hardly an interesting comparison. 
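
The depth-first bias introduced by a large heuristic weight can be seen with two open nodes. The numbers below are hypothetical and chosen only to show the ordering flip between weights 2 and 5; they are not taken from SapaReplan.

```python
def weighted_f(g, h, weight):
    """Weighted A* evaluation: f = g + weight * h."""
    return g + weight * h


# Two hypothetical open nodes: one near the root, one deep in the search.
NODES = {
    "shallow": {"g": 1, "h": 10},
    "deep": {"g": 20, "h": 5},
}


def preferred(weight):
    """Name of the node that weighted A* would expand first."""
    return min(NODES, key=lambda name: weighted_f(weight=weight, **NODES[name]))


print(preferred(2))  # shallow
print(preferred(5))  # deep
```

A larger weight discounts cost already paid relative to estimated remaining cost, so the search commits to deeper nodes and can reach a first solution before it starts exhausting the cheap subspaces.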

A.4 Summary 

Both planners are capable of finding solutions to problems
in the travel domain despite the use of a cost-based
evaluation function by using various tricks to induce a limited
amount of depth-first behavior in an A* framework. This
has the potential effect of delaying exploration of the ε-cost
plateaus slightly, past the discovery of a solution, but
each planner is still ultimately trapped by such plateaus before
being able to find really good solutions. Then such tricks
mostly serve to mask the problems of cost-based search
(and ε-cost), as they merely delay failure by just enough that
one can imagine that the planner is now effective (because
it returns a solution where before it returned none). Using
a size-based evaluation function more directly addresses the
existence of cost plateaus, and not surprisingly leads to
improvement over the equivalent cost-based approach, even
with LAMA.



